Paper Similarity Detection Method Based on Distance Matrix Model with Row-Column Order Penalty Factor

Authors

  • Jun Li
  • Yaqing Han
  • Junshan Pan
Abstract

Paper similarity detection draws on grammatical and semantic analysis, word segmentation, similarity measurement, document summarization, and other techniques from several disciplines. However, the main existing detection models have several problems: incomplete specifications for segmentation preprocessing, the effect of semantic order on detection, the evaluation of near-synonyms, and difficulties in paper backtracking. To address these problems, this paper presents a two-step segmentation model based on special identifiers and the Shapley value, which improves segmentation accuracy. For similarity comparison, it proposes a distance matrix model with a row-column order penalty factor, which recognizes new words through search engine exponent. The model integrates the characteristics of vector detection, Hamming distance, and the longest common substring. By redefining the distance matrix and adding ordinal measures, it handles near-synonyms, word deletion, and changes in word order, making sentence similarity detection more effective in terms of semantics and backbone word segmentation. Compared with traditional paper similarity retrieval, the method offers advantages in word segmentation accuracy, low computational cost, reliability, and efficiency, and is of academic significance for word segmentation, similarity detection, and document summarization.
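
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of a word-level distance matrix combined with an order penalty. It is not the authors' model. The names `word_distance`, `order_penalized_similarity`, `penalty`, and `threshold` are hypothetical, and the greedy matching rule is an assumption chosen for illustration.

```python
from typing import Callable, List, Sequence


def word_distance(a: str, b: str) -> float:
    """Hypothetical word distance: 0.0 for identical words, 1.0 otherwise.

    A near-synonym lexicon could return intermediate values here.
    """
    return 0.0 if a == b else 1.0


def distance_matrix(
    s1: Sequence[str],
    s2: Sequence[str],
    dist: Callable[[str, str], float] = word_distance,
) -> List[List[float]]:
    """Rows are the words of s1, columns are the words of s2."""
    return [[dist(a, b) for b in s2] for a in s1]


def order_penalized_similarity(
    s1: Sequence[str],
    s2: Sequence[str],
    dist: Callable[[str, str], float] = word_distance,
    penalty: float = 0.5,    # assumed weight for out-of-order matches
    threshold: float = 0.5,  # assumed maximum distance for a match
) -> float:
    """Greedily match each word of s1 to its closest unused word of s2.

    A match whose column lies to the left of an earlier in-order match
    is treated as a change in word order and down-weighted by `penalty`.
    Unmatched words (for example deletions) contribute nothing.
    """
    if not s1 or not s2:
        return 0.0
    matrix = distance_matrix(s1, s2, dist)
    used = set()
    last_col = -1
    score = 0.0
    for row in matrix:
        candidates = [
            (d, j) for j, d in enumerate(row)
            if j not in used and d <= threshold
        ]
        if not candidates:
            continue
        d, j = min(candidates)  # smallest distance, then leftmost column
        used.add(j)
        gain = 1.0 - d
        if j < last_col:
            gain *= penalty     # out-of-order match
        else:
            last_col = j
        score += gain
    return score / max(len(s1), len(s2))
```

Under these assumptions, two identical word sequences score 1.0, a sequence with one of three words deleted scores 2/3, and a reordered sequence scores below 1.0 because of the penalty. Passing a different `dist` function is where a near-synonym measure would plug in.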

Similar resources

Determination of weight vector by using a pairwise comparison matrix based on DEA and Shannon entropy

The relation between the analytic hierarchy process (AHP) and data envelopment analysis (DEA) is a topic of interest to researchers in this branch of applied mathematics. In this paper, we propose a linear programming model that generates a weight (priority) vector from a pairwise comparison matrix. In this method, which is referred to as the E-DEAHP method, we consider each row of the pairwise...

Detection of Fake Accounts in Social Networks Based on One Class Classification

Detection of fake accounts on social networks is a challenging process. The previous methods in identification of fake accounts have not considered the strength of the users’ communications, hence reducing their efficiency. In this work, we are going to present a detection method based on the users’ similarities considering the network communications of the users. In the first step, similarity ...

Row/Column-First: A Path-based Multicast Algorithm for 2D Mesh-based Network on Chips

In this paper, we propose a new path-based multicast algorithm that is called Row/Column-First algorithm. The proposed algorithm constructs a set of multicast paths to deliver a multicast message to all multicast destination nodes. The set of multicast paths are all of row-first or column-first subcategories to maximize the multicast performance. The selection of row-first or column-first appro...

CBIR using Combined Feature Vectors of Column-Wise and Row-Wise DCT Transformed Plane Sectorization

Content-Based Image Retrieval (CBIR) is a computer vision technique used to retrieve digital images from a large database. In this paper, we first calculate the feature vectors column-wise and row-wise separately, and then concatenate the two. To evaluate the performance of the proposed method we have used Precision-Recall crossover point, ...

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

Over the past few decades, the detection and prevention of crime required several years of research and analysis. Today, however, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...

Journal:
  • Journal of Multimedia

Volume 9

Publication date: 2014